Lahar: Warehousing Markovian Streams
Author: Julia Maureen Letchner
Abstract
Julia Maureen Letchner
Chair of the Supervisory Committee: Professor Magdalena Balazinska, Computer Science and Engineering

A huge amount of the world’s data is both sequential and low-level. Many applications consume higher-level information, such as words and sentences, that is inferred from low-level sequences such as raw audio signals using a model (e.g., a hidden Markov model). This inference process is typically statistical, resulting in high-level streams that are imprecise. Once archived, these imprecise streams are useful for analytics support, including sequence-finding event queries (e.g., “Find all times when the phrase ‘Barack Obama...veto’ occurs in the NPR news podcast from July 9.”), event query aggregates (e.g., “How many times do 2008 NPR podcasts use the phrase ‘Barack Obama...veto’?”), and event query lineage (e.g., “What words appeared between the words ‘Obama’ and ‘veto’ in the previous query?”). These queries are difficult to support efficiently because archives can be large and standard relational warehouses cannot support analytics on the rich semantics of imprecise sequences; however, these analytics are critical for allowing applications to effectively leverage this data.

In this thesis, we introduce Lahar, the first database system for a common type of imprecise, sequential model called a Markovian stream. Lahar includes novel algorithms for efficiently processing aggregated event queries and event query lineage. Lahar improves the performance and scalability of all queries using several techniques, including a set of novel Markovian stream indices and novel methods for approximating Markovian streams. Through experiments on two real-world datasets (one collected from an office-building RFID deployment and the other from audio podcasts), we demonstrate that Lahar is an efficient Markovian stream warehousing system.
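To make the idea of a sequence-finding event query over a Markovian stream more concrete, here is a minimal sketch in Python. It assumes a Markovian stream is given as an initial distribution over hidden states plus, for each later timestep, a conditional probability table P(X_t | X_{t-1}); the function `pattern_probability`, the toy numbers, and this data layout are hypothetical and are not Lahar's API or its query-processing algorithm. The sketch uses one simple query semantics: the probability that the symbols of a pattern such as ‘Obama...veto’ occur in order (not necessarily adjacently) somewhere in the stream. It propagates a joint distribution over the hidden state and the progress of a pattern matcher, so the correlations between consecutive timesteps are preserved rather than treating each timestep independently.

```python
from collections import defaultdict


def pattern_probability(initial, transitions, pattern):
    """Probability that `pattern` occurs as an in-order (gapped) subsequence.

    Hypothetical representation of a Markovian stream:
      initial:     dict state -> P(X_0 = state)
      transitions: list of CPTs; transitions[t - 1][prev][cur] = P(X_t = cur | X_{t-1} = prev)
      pattern:     sequence of states that must appear in this order
    """
    k = len(pattern)

    def advance(q, state):
        # Greedy subsequence matching: move to the next pattern symbol when it
        # is observed. q == k means the whole pattern matched (absorbing).
        if q < k and state == pattern[q]:
            return q + 1
        return q

    # Joint distribution over (hidden state, matcher progress) at timestep 0.
    joint = defaultdict(float)
    for state, p in initial.items():
        joint[(state, advance(0, state))] += p

    # Forward propagation through each conditional probability table.
    for cpt in transitions:
        nxt = defaultdict(float)
        for (prev, q), p in joint.items():
            for cur, c in cpt.get(prev, {}).items():
                nxt[(cur, advance(q, cur))] += p * c
        joint = nxt

    # Total mass in which the matcher reached its accepting state.
    return sum(p for (_, q), p in joint.items() if q == k)


# Toy three-timestep stream (illustrative numbers only).
initial = {"Obama": 0.6, "other": 0.4}
transitions = [
    {"Obama": {"veto": 0.5, "other": 0.5},
     "other": {"Obama": 0.3, "other": 0.7}},
    {"Obama": {"veto": 0.5, "other": 0.5},
     "veto": {"other": 1.0},
     "other": {"veto": 0.2, "other": 0.8}},
]

p = pattern_probability(initial, transitions, ["Obama", "veto"])
print(f"{p:.2f}")  # prints 0.42
```

On this toy stream, three paths reach the matched state, contributing 0.3 + 0.06 + 0.06 = 0.42. Each step costs time proportional to the number of (state, matcher-progress) pairs times the number of successors per state, so a full scan grows linearly with stream length.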
Similar resources
Lahar Demonstration: Warehousing Markovian Streams
Lahar is a warehousing system for Markovian streams—a common class of uncertain data streams produced via inference on probabilistic models. Example Markovian streams include text inferred from speech, location streams inferred from GPS or RFID readings, and human activity streams inferred from sensor data. Lahar supports OLAP-style queries on Markovian stream archives by leveraging novel appro...
Towards Real-Time Data Stream Processing
Many applications require the continuous tracking of the state of a system in order to detect the occurrence of a particular event. RFID sensors, in particular, have become an increasingly popular means of gathering tracking information about the objects of interest. The need to query these data has spurred research at the intersection of sensor networks and databases. There are a number of cha...
Approximation Trade-Offs in a Markovian Stream Warehouse: An Empirical Study UW TR: #UW-CSE-09-07-03
A large amount of the world’s data is both sequential and low-level. Many applications need to query higher-level information (e.g., words and sentences) that is inferred from these low-level sequences (e.g., raw audio signals) using a model (e.g., a hidden Markov model). This inference process is typically statistical, resulting in high-level sequences that are imprecise. Once archived, these ...
Approximation trade-offs in a Markovian stream warehouse: An empirical study
A large amount of the world’s data is both sequential and low-level. Many applications need to query higher-level information (e.g., words and sentences) that is inferred from these low-level sequences (e.g., raw audio signals) using a model (e.g., a hidden Markov model). This inference process is typically statistical, resulting in high-level sequences that are imprecise. Once archived, these ...
Data Stream Warehousing In Tidalrace
Big data is a ubiquitous feature of large modern enterprises. Many organizations generate huge amounts of on-line streaming data – examples include network monitoring, Twitter feeds, financial data, and industrial application monitoring. Making effective use of these data streams can be challenging. While Data Stream Management Systems can provide support for realtime alerting and data reductio...
Publication date: 2010